3 research outputs found

    Data access and integration in the ISPIDER proteomics grid

    Grid computing has great potential for supporting the integration of complex, fast-changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources, which are rapidly being developed following the emergence of affordable, reliable methods for studying the proteome. The protein identifications produced by these methods are held in multiple repositories, which need to be integrated to enable uniform access to them. A number of technologies exist that enable these resources to be accessed in a Grid environment, but the independent development of the resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture that supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture to the integration of several autonomous proteomics data resources.
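
    The paper's contribution is middleware-level, but the mediator idea it builds on can be sketched in a few lines. The Python sketch below is purely illustrative: the source names, local schemas and the `rewrite` helper are invented here, and none of it is the actual OGSA-DAI, OGSA-DQP or AutoMed API. It shows only the core pattern of rewriting a query over a global schema into per-source queries that a distributed query processor would then ship out and union.

```python
# Hypothetical sketch of the mediator pattern described above: a query
# over a unified proteomics schema is rewritten into per-source queries.
# Source names, column names and the query are invented for illustration;
# this is not the OGSA-DAI/OGSA-DQP/AutoMed API.

from dataclasses import dataclass

@dataclass
class Source:
    name: str
    # Mapping from global attribute names to this source's column names.
    attribute_map: dict

    def rewrite(self, global_query: str) -> str:
        """Translate a query over the global schema into source terms."""
        query = global_query
        for global_attr, local_attr in self.attribute_map.items():
            query = query.replace(global_attr, local_attr)
        return query

# Three autonomous repositories exposing protein identifications under
# different local schemas (all hypothetical).
sources = [
    Source("source_a", {"protein_id": "prot_acc", "score": "ident_score"}),
    Source("source_b", {"protein_id": "accession", "score": "expect"}),
    Source("source_c", {"protein_id": "entry_id", "score": "confidence"}),
]

global_query = "SELECT protein_id, score FROM identifications"
for src in sources:
    # In the real architecture a distributed query processor would ship
    # each rewritten fragment to its source and union the results.
    print(f"{src.name}: {src.rewrite(global_query)}")
```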

    DSToolkit: An architecture for flexible Dataspace Management

    The vision of dataspaces is to provide many of the benefits of classical data integration, but with reduced up-front costs. Combining this with opportunities for incremental refinement enables a 'pay-as-you-go' approach to data integration, resulting in simplified integrated access to distributed data. It has been speculated that model management could provide the basis for dataspace management, but this had not been investigated until now. Here we present DSToolkit, the first dataspace management system based on model management. It therefore benefits from the flexibility that approach provides for managing schemas represented in heterogeneous models; it supports the complete dataspace lifecycle, including automatic initialisation, maintenance and improvement of a dataspace; and it allows users to provide feedback by annotating the result tuples returned for the queries they have posed. The feedback gathered is used for improvement by annotating, selecting and refining mappings. Without requiring additional feedback on a new data source, these techniques can also be applied to estimate its quality with respect to the feedback already gathered and to identify the best mappings over all sources, including the new one.
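
    As a rough illustration of the feedback-driven improvement described above, the sketch below scores candidate mappings by the precision of user annotations on their result tuples and keeps the best-scoring mapping per source. All structures, names and numbers are invented for illustration and do not reflect DSToolkit's actual interfaces; note how a mapping with no feedback defaults to zero evidence, which matches the pay-as-you-go idea of improving only where users have invested effort.

```python
# Hypothetical sketch of feedback-driven mapping selection: users mark
# returned tuples as correct (True) or incorrect (False), each mapping
# is annotated with an estimated precision, and the best mapping per
# source is kept. Invented for illustration; not the DSToolkit API.

def precision(feedback):
    """Estimate mapping precision from (tuple_id, is_correct) feedback."""
    if not feedback:
        return 0.0  # no evidence gathered yet
    correct = sum(1 for _, ok in feedback if ok)
    return correct / len(feedback)

# Candidate mappings from two sources into the dataspace schema, each
# with the feedback its result tuples have received so far.
candidates = {
    ("source_a", "m1"): [("t1", True), ("t2", True), ("t3", False)],
    ("source_a", "m2"): [("t4", False), ("t5", False)],
    ("source_b", "m3"): [("t6", True), ("t7", True)],
}

best = {}
for (source, mapping), feedback in candidates.items():
    p = precision(feedback)
    if source not in best or p > best[source][1]:
        best[source] = (mapping, p)

for source, (mapping, p) in best.items():
    print(f"{source}: keep {mapping} (estimated precision {p:.2f})")
```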

    Conjoint Mining of Data and Content with Applications in Business, Bio-medicine, Transport Logistics and Electrical Power Systems

    Part 1: Keynote. Digital information within an enterprise consists of (1) structured data and (2) unstructured content. The structured data includes enterprise and business data such as sales, customers, products, accounts, inventory and enterprise assets, while the content includes contracts, reports, emails, customer opinions, transcribed calls, online inquiries, compliments and complaints. Furthermore, cutting-edge businesses are also using GPS tracking, surveillance monitors and sensor technologies for productivity, performance and efficiency measures, often provided by outsourcers. Similarly, in the biomedical area, resources can be structured data, say in Swiss-Prot, or unstructured text information in journal articles stored in content repositories such as PubMed. The structured data and the unstructured content generally reside in entirely separate repositories, with the former managed by a DBMS and the latter by a content manager frequently provided by an outsourcer or vendor [76]. This separation is undesirable, since the information content of these sources is complementary. Moreover, each outsourcer or vendor keeps the data in its own cloud; the data are not sharable between vendor systems, most vendor systems are not integrated with the enterprise systems, and the organization is left to consolidate the data and information manually for analytics. Effective knowledge and information use requires seamless access to, and intelligent analysis of, information in its totality, allowing enterprises to gain enhanced critical insights. This is becoming even more important as the proportion of structured to unstructured information has shifted from 50-50 in the 1960s to 5-95 today [1]. Unless we can effectively utilize the unstructured content conjointly with the structured data, we will obtain only limited and shallow knowledge discovery from an increasingly narrow slice of information. The techniques developed in our research will then be used to address significant issues in three application areas, though potential applications with significant impact are much more extensive.
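
    A minimal sketch of the conjoint access being argued for, assuming a toy customer table on the structured side and a handful of emails on the content side (all data and the simple string-matching rule are invented; real systems would use proper entity resolution and information extraction rather than substring matching):

```python
# Hypothetical sketch of conjoint mining of data and content: structured
# records (a customer table) are linked to unstructured content
# (complaint emails) by matching entity mentions, so analysis can draw
# on both sides at once. All data are invented for illustration.

customers = [  # structured side, e.g. held in a DBMS
    {"id": 101, "name": "Acme Ltd", "segment": "enterprise"},
    {"id": 102, "name": "Beta GmbH", "segment": "smb"},
]

emails = [  # unstructured side, e.g. held in a content repository
    "Acme Ltd reports the March invoice was duplicated.",
    "New pricing question from Beta GmbH about tier limits.",
]

# Naive conjoint join: link each email to the customer it mentions.
linked = [
    (c["id"], c["segment"], text)
    for c in customers
    for text in emails
    if c["name"].lower() in text.lower()
]

for cust_id, segment, text in linked:
    print(f"customer {cust_id} [{segment}]: {text}")
```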